feat: Demo mode #2523

Closed
wants to merge 33 commits into from

Conversation

@Matvey-Kuk Matvey-Kuk (Contributor) commented Nov 18, 2024

LIVE_DEMO_MODE will launch what used to be simulate_alerts.py in the background

Closes #2524

✅ Closed in favor of #2561

Summary by CodeRabbit

  • New Features

    • Introduced a new environment variable KEEP_LIVE_DEMO_MODE for simulating alerts in demo mode.
    • Added functionality for simulating alerts through the enhanced demo_mode_runner module, including structured correlation rules and a robust service topology.
  • Bug Fixes

    • Enhanced alert configurations across multiple providers to improve specificity and organization.
  • Documentation

    • Updated deployment documentation to include the new KEEP_LIVE_DEMO_MODE environment variable.
  • Chores

    • Simplified alert simulation logic by integrating with the new demo mode functionality.
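
The KEEP_LIVE_DEMO_MODE variable described above is read from the environment at startup. As a rough sketch of the boolean-flag parsing pattern (the actual parsing lives in keep/api/config.py, so the default and accepted values shown here are assumptions):

```python
import os

# Hedged sketch: parse a boolean env flag such as KEEP_LIVE_DEMO_MODE with a
# safe default and a case-insensitive comparison. The real implementation in
# keep/api/config.py may differ in which truthy spellings it accepts.
def parse_bool_env(name, default="false"):
    return os.environ.get(name, default).strip().lower() in ("true", "1", "yes")
```

With this pattern an unset variable quietly falls back to the default instead of raising, which is the usual expectation for opt-in demo flags.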


vercel bot commented Nov 18, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name: keep · Status: Ignored · Updated (UTC): Nov 20, 2024 9:04am

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Nov 18, 2024
@Matvey-Kuk Matvey-Kuk changed the title Demo mode feat: Demo mode Nov 18, 2024
@dosubot dosubot bot added Documentation Improvements or additions to documentation Feature A new feature labels Nov 18, 2024
@Matvey-Kuk Matvey-Kuk marked this pull request as draft November 18, 2024 11:42
@Matvey-Kuk Matvey-Kuk marked this pull request as ready for review November 18, 2024 14:52
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Nov 18, 2024
Resolved review threads (outdated): keep/api/config.py (2)
Matvey-Kuk and others added 2 commits November 18, 2024 18:11
Co-authored-by: Tal <[email protected]>
Signed-off-by: Matvey Kukuy <[email protected]>
Co-authored-by: Tal <[email protected]>
Signed-off-by: Matvey Kukuy <[email protected]>
logger.info("Demo mode launched.")

keep_api_url = "http://localhost:" + str(os.environ.get("PORT", 8080))
keep_api_key = os.environ.get("KEEP_API_KEY")
A repository Member suggested this change:

-keep_api_key = os.environ.get("KEEP_API_KEY")
+keep_api_key = os.environ.get("KEEP_READ_ONLY_BYPASS_KEY")


coderabbitai bot commented Nov 19, 2024

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The changes in this pull request introduce a new environment variable KEEP_LIVE_DEMO_MODE to the Keep application, allowing for a demo mode that simulates alerts. The configuration documentation has been updated accordingly. The keep/api/config.py file has been modified to include a new global variable LIVE_DEMO_MODE and a function to manage demo mode activation. Additionally, various alert mock files across different providers have been updated to enhance alert configurations.

Changes

  • docs/deployment/configuration.mdx: Added new environment variable KEEP_LIVE_DEMO_MODE for demo mode configuration.
  • keep/api/config.py: Introduced global variable LIVE_DEMO_MODE and updated on_starting function for demo mode management.
  • keep/api/core/demo_mode_runner.py: Added functions for managing alerts and demo mode: get_or_create_topology, get_or_create_correlation_rules, remove_old_incidents, simulate_alerts, launch_demo_mode.
  • keep/providers/cloudwatch_provider/alerts_mock.py: Updated ALERTS dictionary for high_cpu_usage alert with new alarm names.
  • keep/providers/datadog_provider/alerts_mock.py: Modified tags for high_cpu_usage and low_disk_space alerts to include service:api.
  • keep/providers/grafana_provider/alerts_mock.py: Added service key to database_connection_failure, high_memory_usage, and network_latency_high alerts.
  • keep/providers/prometheus_provider/alerts_mock.py: Updated labels.service for multiple alerts and added customer_id to mq_third_full alert.
  • scripts/simulate_alerts.py: Simplified alert simulation by calling simulate_alerts function from demo_mode_runner.

Assessment against linked issues

  • Implement demo mode functionality: addressed

🐰 In a world where alerts do play,
A demo mode brightens the day.
With variables new and logic refined,
Simulating alerts, oh how well aligned!
Hop along, let the testing commence,
In our lively demo, there's no pretense! 🐇✨


@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (10)
keep/providers/cloudwatch_provider/alerts_mock.py (1)

14-14: Consider namespace consistency with pod-related alerts.

The new alarm names are more descriptive and better reflect real-world scenarios. However, "PodRecycled" seems inconsistent with the AWS/EC2 namespace in the alert payload. If these alerts are meant to simulate Kubernetes pod scenarios, consider using a more appropriate namespace (e.g., ContainerInsights).

Consider this modification:

-            "Message.AlarmName": ["HighCPUUsage", "HighCPUUsageOnAPod", "PodRecycled"],
+            "Message.AlarmName": ["HighCPUUsage", "HighCPUUsageOnInstance", "InstanceRecycled"],

Or update the namespace if these are meant to be Kubernetes-related alerts:

             "MetricName": "CPUUtilization",
-            "Namespace": "AWS/EC2",
+            "Namespace": "ContainerInsights",
scripts/simulate_alerts.py (1)

Line range hint 1-31: Well-structured refactoring of alert simulation

The changes effectively:

  • Simplify the script by moving complex logic to a dedicated module
  • Provide configuration flexibility through environment variables
  • Follow separation of concerns principle

This aligns well with the PR objective of implementing demo mode.

keep/providers/datadog_provider/alerts_mock.py (1)

Line range hint 1-38: Consider documenting the alert variation logic

Since this mock data will be used by the new demo mode feature, it would be helpful to add comments explaining:

  1. The purpose of parameter variations
  2. How the demo mode selects between different parameter options
  3. The relationship between payload defaults and parameter variations

This documentation would help maintainers understand the demo mode behavior.

keep/providers/prometheus_provider/alerts_mock.py (2)

23-23: Consider making customer_id configurable for demo flexibility

While "acme" is a good default, consider making the customer_id a parameter for more flexible demos. This would allow demonstrating multi-tenant scenarios more effectively.

"parameters": {
    "labels.queue": ["queue1", "queue2", "queue3"],
    "labels.service": ["calendar-producer-java-otel-api-dd", "kafka", "queue"],
    "labels.mq_manager": ["mq_manager1", "mq_manager2", "mq_manager3"],
+   "labels.customer_id": ["acme", "demo-corp", "test-org"],
},

Also applies to: 28-28


14-14: Consider reducing service list duplication

The same service list appears in multiple alerts. Consider defining it once as a constant to improve maintainability.

+COMMON_SERVICES = ["calendar-producer-java-otel-api-dd", "kafka", "api", "queue", "db"]

 ALERTS = {
     "high_cpu_usage": {
         # ...
         "parameters": {
-            "labels.service": ["calendar-producer-java-otel-api-dd", "kafka", "api", "queue", "db"],
+            "labels.service": COMMON_SERVICES,

Also applies to: 41-41, 54-54

keep/api/config.py (1)

62-73: Ensure idempotency of when_ready function.

The function should be idempotent as it might be called multiple times by the server. Consider adding a guard to prevent multiple initializations of demo mode.

+_demo_mode_initialized = False

 def when_ready(server):
     """This function is called by the gunicorn server when it is ready to accept requests"""
     logger.info("Keep server ready")

     launch_uptime_reporting()

-    if LIVE_DEMO_MODE:
+    global _demo_mode_initialized
+    if LIVE_DEMO_MODE and not _demo_mode_initialized:
         logger.info("Launching Keep in demo mode.")
         try:
             from keep.api.core.demo_mode_runner import launch_demo_mode
             launch_demo_mode()
+            _demo_mode_initialized = True
         except Exception as e:
             logger.error(f"Failed to launch demo mode: {e}")
     else:
         logger.info("Not launching Keep in demo mode.")
keep/providers/grafana_provider/alerts_mock.py (1)

Line range hint 4-97: Consider establishing service categorization standards

The addition of service categorization is a good enhancement, but there seems to be no clear standard for service naming. Consider:

  1. Documenting the allowed service categories
  2. Establishing naming conventions for services
  3. Adding validation for service values

This would be particularly important for the demo mode feature to ensure consistent and realistic alert simulation.

keep/api/core/demo_mode_runner.py (3)

394-394: Fix the typo in the docstring.

There's a typo in the docstring: "backgound" should be "background".

Apply this diff to correct the typo:

     """
-    Running async demo in the backgound.
+    Running async demo in the background.
     """

399-400: Ensure the default port is correctly set as a string.

When constructing keep_api_url, ensure that the default port is a string to avoid issues during string concatenation.

Apply this diff to set the default port as a string:

 keep_api_url = os.environ.get(
-    "KEEP_API_URL", "http://localhost:" + str(os.environ.get("PORT", 8080))
+    "KEEP_API_URL", "http://localhost:" + os.environ.get("PORT", "8080")
 )
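
The fix works because os.environ values are always strings, so keeping the default a string avoids mixing types in the concatenation. A tiny self-contained illustration of the pattern (the build_api_url helper is purely for illustration and not part of the PR):

```python
# Env mappings yield str values, so a string default keeps the concatenation
# type-safe with no str() conversion needed. `environ` can be os.environ or
# any mapping, which also makes the helper easy to test.
def build_api_url(environ):
    port = environ.get("PORT", "8080")
    return "http://localhost:" + port
```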

338-390: Add a top-level exception handler in the simulate_alerts loop.

To prevent the simulate_alerts thread from exiting unexpectedly due to unhandled exceptions, consider adding a top-level exception handler around the while True loop. This ensures that any unforeseen errors are logged and the loop continues running.

Apply this diff to add a top-level exception handler:

 def simulate_alerts(
     keep_api_url=None,
     keep_api_key=None,
     sleep_interval=5,
     demo_correlation_rules=False,
     demo_topology=False,
 ):
     logger.info("Simulating alerts...")
     GENERATE_DEDUPLICATIONS = True

     providers = [
         "prometheus",
         "grafana",
         "cloudwatch",
         "datadog",
     ]

     providers_to_randomize_fingerprint_for = [
         "cloudwatch",
         "datadog",
     ]

     provider_classes = {
         provider: ProvidersFactory.get_provider_class(provider)
         for provider in providers
     }

     if demo_correlation_rules:
         logger.info("Creating correlation rules...")
         get_or_create_correlation_rules(keep_api_key, keep_api_url)
         logger.info("Correlation rules created.")
     if demo_topology:
         logger.info("Creating topology...")
         get_or_create_topology(keep_api_key, keep_api_url)
         logger.info("Topology created.")

+    try:
         while True:
             logger.info("Looping to send alerts...")

             logger.info("Removing old incidents...")
             remove_old_incidents(keep_api_key, keep_api_url)
             logger.info("Old incidents removed.")

             # choose provider
             provider_type = random.choice(providers)
             send_alert_url = "{}/alerts/event/{}".format(keep_api_url, provider_type)
             provider = provider_classes[provider_type]
             alert = provider.simulate_alert()

             send_alert_url_params = {}

             if provider_type in providers_to_randomize_fingerprint_for:
                 send_alert_url_params["fingerprint"] = "".join(
                     random.choices("abcdefghijklmnopqrstuvwxyz0123456789", k=10)
                 )

             # Determine number of times to send the same alert
             num_iterations = 1
             if GENERATE_DEDUPLICATIONS:
                 num_iterations = random.randint(1, 3)

             for _ in range(num_iterations):
                 logger.info("Sending alert: {}".format(alert))
                 try:
                     env = random.choice(["production", "staging", "development"])
                     send_alert_url_params["provider_id"] = f"{provider_type}-{env}"
                     prepared_request = PreparedRequest()
                     prepared_request.prepare_url(send_alert_url, send_alert_url_params)
                     response = requests.post(
                         prepared_request.url,
                         headers={"x-api-key": keep_api_key},
                         json=alert,
                     )
                     response.raise_for_status()  # Raise an HTTPError for bad responses
                 except requests.exceptions.RequestException as e:
                     logger.error("Failed to send alert: {}".format(e))
                     time.sleep(sleep_interval)
                     continue

                 if not response.ok:
                     logger.error("Failed to send alert: {}".format(response.text))
                 else:
                     logger.info("Alert sent successfully")

             logger.info(
                 "Sleeping for {} seconds before next iteration".format(sleep_interval)
             )
             time.sleep(sleep_interval)
+    except Exception as e:
+        logger.exception("An unexpected error occurred in simulate_alerts: {}".format(e))
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b53e502 and 353e42c.

📒 Files selected for processing (8)
  • docs/deployment/configuration.mdx (1 hunks)
  • keep/api/config.py (2 hunks)
  • keep/api/core/demo_mode_runner.py (1 hunks)
  • keep/providers/cloudwatch_provider/alerts_mock.py (1 hunks)
  • keep/providers/datadog_provider/alerts_mock.py (2 hunks)
  • keep/providers/grafana_provider/alerts_mock.py (3 hunks)
  • keep/providers/prometheus_provider/alerts_mock.py (4 hunks)
  • scripts/simulate_alerts.py (2 hunks)
🔇 Additional comments (12)
scripts/simulate_alerts.py (3)

4-4: LGTM: Clean import of demo mode functionality

The import statement properly organizes demo-related functionality into a dedicated module.


21-26: Clarify the purpose of unused correlation rules feature

The demo_correlation_rules parameter is set to False but its purpose isn't clear. Consider:

  • Adding a comment explaining this parameter
  • Documenting if this is for future use
✅ Verification successful

The demo_correlation_rules parameter is a valid feature for creating demo correlation rules

The parameter is used to optionally create predefined correlation rules during alert simulation, as evidenced by:

  • Implementation in keep/api/core/demo_mode_runner.py shows a complete correlation rules feature
  • When enabled, it creates rules like "Message Queue Buckle Up" that groups MQ-related alerts
  • The parameter is intentionally set to False in the simulation script to avoid creating demo rules by default

Consider adding a docstring to explain this optional feature for other developers.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check the implementation of simulate_alerts to understand correlation rules
ast-grep --pattern 'def simulate_alerts($$$)' 

# Search for any correlation rules implementation
rg 'correlation.*rules|rules.*correlation' -A 5

Length of output: 4263


16-18: Verify the default sleep interval value

The default sleep interval of 0.2 seconds might need validation:

  • Could be too aggressive for demo purposes
  • Might need adjustment based on typical alert patterns
✅ Verification successful

Let me gather more specific information about the alert simulation patterns.


Default sleep interval of 0.2 seconds is appropriate

The default sleep interval of 0.2 seconds in the alert simulation script is well-calibrated based on the codebase evidence:

  • Tests use shorter intervals (0.02s) for alert generation in controlled environments
  • Other system components use longer intervals (1s) for operational tasks
  • The 0.2s value provides a balanced rate for demo alert generation while avoiding overwhelming the system
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any existing sleep intervals in alert-related files for comparison
rg -g '!*.pyc' -g '!*.git' 'sleep.*interval|interval.*sleep' 

Length of output: 4263


Script:

#!/bin/bash
# Check the alert simulation logic and timing patterns
rg "time\.sleep|SLEEP_INTERVAL" -B 2 -A 2

Length of output: 8768

keep/providers/datadog_provider/alerts_mock.py (1)

32-33: Review tag consistency between payload and parameters

There's a mismatch between the teams specified in the payload (team:database) and the parameters section:

  • Production uses team:analytics
  • Staging uses team:database

If this variation is intentional for demo purposes, consider documenting this. Otherwise, consider aligning the teams for consistency.

keep/providers/prometheus_provider/alerts_mock.py (2)

14-14: LGTM! Service labels expanded for better demo coverage

The addition of common service types (api, queue, db) provides a more comprehensive demo experience.


41-41: LGTM! Consistent service coverage

The service labels are consistent with other alerts and appropriate for disk space monitoring scenarios.

keep/api/config.py (3)

16-16: LGTM! Implementation follows the suggested convention.

The environment variable handling is properly implemented with a safe default and case-insensitive comparison.


58-60: Verify server state management between on_starting and when_ready.

The "Keep server started" message in on_starting followed by "Keep server ready" in when_ready might be confusing. Consider clarifying the server states or combining the messages.


66-67: Consider the order of operations.

The launch_uptime_reporting() is called before checking demo mode. Consider whether this ordering is intentional and if uptime reporting should be active in demo mode.

✅ Verification successful

Let me gather more information about the uptime reporting configuration and demo mode interaction.


Based on the results, I can now provide the final response:

The order of operations is correct and intentional

The uptime reporting is independently controlled by the POSTHOG_DISABLED environment variable and has its own checks. It will:

  • Only run if POSTHOG_DISABLED is false
  • Only launch if PostHog is reachable
  • Log appropriate messages about its status

The demo mode has no interaction with or impact on the uptime reporting functionality, so their relative order of initialization doesn't matter.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check uptime reporting behavior in demo mode

# Look for uptime reporting configuration and usage
rg -A 5 "launch_uptime_reporting"

# Check if demo mode affects uptime reporting
ast-grep --pattern 'launch_uptime_reporting($$$)'

Length of output: 1163


Script:

#!/bin/bash
# Check uptime reporting implementation and demo mode configuration
rg -A 10 "POSTHOG_DISABLED|LIVE_DEMO_MODE" keep/api/core/

# Check if there's any interaction between demo mode and uptime reporting
rg -A 5 "is_posthog_reachable|report_uptime" keep/api/core/demo_mode_runner.py

Length of output: 2299

keep/providers/grafana_provider/alerts_mock.py (2)

52-52: LGTM! Verify consistency with other memory alerts

The service categorization as "api" aligns well with the server monitoring context.

✅ Verification successful

Service categorization is consistent and appropriate

Looking at the alerts_mock.py file, the service categorization is consistent and properly applied:

  • The high memory usage alert monitors API servers (server1, server2, server3) and is correctly tagged with "service": "api"
  • Other alerts in the file also follow consistent service categorization:
    • Database connection failures for API service are tagged with "service": "api"
    • Network latency alerts for database components are tagged with "service": "db"
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for consistency in service categorization across memory-related alerts
rg -A 5 'memory.*service' --type py

Length of output: 646


Script:

#!/bin/bash
# Search for alert definitions containing "memory" in grafana provider
rg -B 5 -A 10 '"memory"' keep/providers/grafana_provider/

Length of output: 1336


Script:

#!/bin/bash
# Search for service field in alert definitions
rg -B 5 -A 10 '"service":' keep/providers/grafana_provider/

Length of output: 3717


Script:

#!/bin/bash
# Get the full content of alerts_mock.py to understand the context
cat keep/providers/grafana_provider/alerts_mock.py

Length of output: 5582


4-4: Consider revising the service categorization

The alert monitors database connections (as evidenced by "monitor": "database") but is categorized under the "api" service. This might cause confusion in alert routing and monitoring. Consider whether this alert should be categorized under the "db" service instead for better consistency with its monitoring target.

docs/deployment/configuration.mdx (1)

28-28: Documentation change looks good!

The new environment variable KEEP_LIVE_DEMO_MODE is well-documented and properly integrated into the configuration table. The description clearly explains its purpose for simulating alerts and activity, which aligns with the PR objectives.

Resolved review threads: keep/api/config.py, keep/api/core/demo_mode_runner.py
@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (1)
keep/api/core/demo_mode_runner.py (1)

25-55: Consider moving correlation rules to a configuration file.

While the rules are well-structured, having them hardcoded in the source code makes it less maintainable. Consider moving them to a separate configuration file (e.g., YAML or JSON) for easier maintenance and updates.
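
A sketch of what externalizing the rules could look like. The schema and the JSON format are assumptions (the reviewer also mentions YAML); only the "Message Queue Buckle Up" rule name comes from the PR discussion itself:

```python
import json

# Hypothetical externalized correlation rules; in practice this blob would be
# read from a file shipped alongside demo_mode_runner.py. The field names
# below are illustrative, not the actual Keep schema.
RULES_JSON = """
[
  {
    "name": "Message Queue Buckle Up",
    "timeframeInSeconds": 900,
    "attributes": ["service"]
  }
]
"""

def load_correlation_rules(raw=RULES_JSON):
    """Parse correlation rules from an external configuration blob,
    validating that every rule at least carries a name."""
    rules = json.loads(raw)
    for rule in rules:
        if "name" not in rule:
            raise ValueError("correlation rule missing 'name'")
    return rules
```

Loading the rules this way lets demos be tuned without touching source code, at the cost of one extra file to version.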

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 353e42c and e333e9e.

📒 Files selected for processing (1)
  • keep/api/core/demo_mode_runner.py (1 hunks)
🔇 Additional comments (1)
keep/api/core/demo_mode_runner.py (1)

1-24: LGTM! Well-organized imports and logging setup.

The imports are properly organized and all necessary dependencies are included.

Resolved review threads: keep/api/core/demo_mode_runner.py (1 current, 2 outdated)
@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
keep/api/core/demo_mode_runner.py (2)

302-308: Add rate limiting for alert generation.

The alert simulation lacks rate limiting which could overwhelm the API endpoints.

Consider implementing a token bucket rate limiter:

+from keep.api.utils.rate_limiter import TokenBucketRateLimiter
+
 def simulate_alerts(
     keep_api_url=None,
     keep_api_key=None,
     sleep_interval=5,
     demo_correlation_rules=False,
-    demo_topology=False,
+    demo_topology=False,
+    requests_per_minute=60,
 ):
+    rate_limiter = TokenBucketRateLimiter(requests_per_minute)
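
The imported keep.api.utils.rate_limiter module does not appear anywhere in the diff, so it is presumably hypothetical. Under that assumption, a minimal token-bucket sketch of what such a class could look like:

```python
import time

class TokenBucketRateLimiter:
    """Hypothetical sketch of a token bucket: refills at rate_per_minute/60
    tokens per second, up to a burst capacity of rate_per_minute."""

    def __init__(self, rate_per_minute):
        self.capacity = float(rate_per_minute)
        self.tokens = float(rate_per_minute)
        self.refill_per_sec = rate_per_minute / 60.0
        self.last = time.monotonic()

    def acquire(self):
        """Block until one token is available, then consume it."""
        while True:
            now = time.monotonic()
            # Credit tokens accrued since the last call, capped at capacity.
            self.tokens = min(
                self.capacity, self.tokens + (now - self.last) * self.refill_per_sec
            )
            self.last = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return
            # Sleep just long enough for the next token to accrue.
            time.sleep((1.0 - self.tokens) / self.refill_per_sec)
```

simulate_alerts would then call rate_limiter.acquire() immediately before each requests.post, so bursts drain the bucket and sustained traffic settles at the configured rate.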

335-336: Fix static analysis issues.

Two minor issues found:

  1. Unused exception variable
  2. Unnecessary f-string

Apply these fixes:

-        except requests.exceptions.RequestException as e:
+        except requests.exceptions.RequestException:
-            logger.info(f"Demo thread: API is not up yet. Waiting...")
+            logger.info("Demo thread: API is not up yet. Waiting...")
🧰 Tools
🪛 Ruff

335-335: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)


336-336: f-string without any placeholders

Remove extraneous f prefix

(F541)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between e333e9e and 2256347.

📒 Files selected for processing (1)
  • keep/api/core/demo_mode_runner.py (1 hunks)
🧰 Additional context used
🪛 Ruff
keep/api/core/demo_mode_runner.py

335-335: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)


336-336: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (2)
keep/api/core/demo_mode_runner.py (2)

348-405: 🛠️ Refactor suggestion

Improve error handling in the simulation loop.

The infinite loop in simulate_alerts could benefit from better error handling and a graceful shutdown mechanism.

Consider:

  1. Adding a shutdown flag
  2. Implementing backoff strategy for failed requests
  3. Adding proper cleanup on exit
+    shutdown_flag = threading.Event()
-    while True:
+    while not shutdown_flag.is_set():
         try:
             logger.info("Looping to send alerts...")

Likely invalid or redundant comment.


407-438: 🛠️ Refactor suggestion

Review thread configuration and sleep interval.

The demo mode initialization has several potential issues:

  1. Using a daemon thread could lead to abrupt termination of ongoing operations
  2. The sleep interval of 0.2 seconds might be too aggressive

Consider:

  1. Making the sleep interval configurable via environment variable
  2. Implementing proper thread cleanup
  3. Using a non-daemon thread with proper shutdown handling
+    sleep_interval = float(os.environ.get("KEEP_DEMO_SLEEP_INTERVAL", "5.0"))
     thread = threading.Thread(
         target=simulate_alerts,
         kwargs={
             "keep_api_key": keep_api_key,
             "keep_api_url": keep_api_url,
-            "sleep_interval": 0.2,
+            "sleep_interval": sleep_interval,
             "demo_correlation_rules": True,
         },
     )
-    thread.daemon = True
+    # Register thread for cleanup on application shutdown
+    from keep.api.utils.thread_manager import register_cleanup_thread
+    register_cleanup_thread(thread)

Likely invalid or redundant comment.
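
The register_cleanup_thread helper referenced in the suggestion above also does not exist in the diff, so treat it as hypothetical. The underlying pattern of a cooperatively stoppable worker, though, can be sketched with a plain threading.Event:

```python
import threading
import time

results = {"iterations": 0}
stop_event = threading.Event()

def run_simulation(stop, sleep_interval=0.01):
    # Loop until shutdown is requested. stop.wait() doubles as an
    # interruptible sleep, so shutdown takes effect immediately instead
    # of after a full sleep_interval.
    while not stop.is_set():
        results["iterations"] += 1  # stand-in for "send one simulated alert"
        stop.wait(sleep_interval)

worker = threading.Thread(target=run_simulation, args=(stop_event,))
worker.start()
time.sleep(0.05)      # let the demo loop run briefly
stop_event.set()      # request graceful shutdown
worker.join(timeout=1)
```

Compared with a daemon thread, this costs one extra shutdown hook but guarantees the loop finishes its current iteration before the process exits.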

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (1)
keep/api/core/demo_mode_runner.py (1)

302-405: Improve alert simulation robustness and configurability.

The alert simulation could be enhanced in several ways:

  1. Alert payload validation before sending
  2. Configurable alert types and severity levels
  3. Metrics collection for monitoring the demo mode

Consider adding alert validation and metrics:

+from prometheus_client import Counter, Histogram
+
+alerts_sent = Counter('demo_alerts_sent_total', 'Number of demo alerts sent', ['provider', 'status'])
+alert_duration = Histogram('demo_alert_duration_seconds', 'Time taken to send demo alerts')
+
 def simulate_alerts(...):
     while True:
         try:
             provider_type = random.choice(providers)
             alert = provider.simulate_alert()
+            
+            # Validate alert payload
+            if not provider.validate_alert(alert):
+                logger.error(f"Invalid alert payload for {provider_type}")
+                continue
+            
+            with alert_duration.time():
                 response = requests.post(...)
+                alerts_sent.labels(provider=provider_type, status='success').inc()
         except Exception as e:
+            alerts_sent.labels(provider=provider_type, status='error').inc()
             logger.exception("Error in simulate_alerts")
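The `provider.validate_alert` call in the diff above is likewise hypothetical. A self-contained sketch of the kind of check it could perform; the required field names and severity levels are assumptions for illustration, not Keep's schema:

```python
# Assumed minimal schema for a demo alert payload.
REQUIRED_FIELDS = {"name", "severity", "source"}
ALLOWED_SEVERITIES = {"critical", "high", "warning", "info", "low"}


def validate_alert(alert: dict) -> bool:
    """Reject payloads missing required fields or carrying an unknown severity."""
    if REQUIRED_FIELDS - alert.keys():
        return False
    return alert["severity"] in ALLOWED_SEVERITIES


assert validate_alert({"name": "mq latency", "severity": "high", "source": "prometheus"})
assert not validate_alert({"name": "mq latency", "severity": "urgent", "source": "prometheus"})
```

Validating before the POST keeps malformed simulated alerts out of the error metrics, so failures in the counter reflect real transport problems.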
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 2256347 and baacc13.

📒 Files selected for processing (1)
  • keep/api/core/demo_mode_runner.py (1 hunks)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between baacc13 and c1dd51a.

📒 Files selected for processing (1)
  • keep/api/core/demo_mode_runner.py (1 hunks)
🔇 Additional comments (1)
keep/api/core/demo_mode_runner.py (1)

348-405: ⚠️ Potential issue

Add graceful shutdown mechanism.

The infinite loop in simulate_alerts lacks a proper shutdown mechanism, which could make it difficult to stop the demo mode gracefully.

Add a shutdown flag:

+    shutdown_flag = threading.Event()
-    while True:
+    while not shutdown_flag.is_set():
         try:
             logger.info("Looping to send alerts...")

Likely invalid or redundant comment.
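The `shutdown_flag` fragment above is not runnable on its own. A self-contained sketch of the same pattern, with the send step reduced to a counter and a `max_iterations` escape hatch added purely so the loop can be exercised in a test:

```python
import threading
from typing import Optional


def simulate_alerts(shutdown_flag: threading.Event,
                    max_iterations: Optional[int] = None) -> int:
    """Run the send loop until shutdown_flag is set; returns rounds completed."""
    iterations = 0
    while not shutdown_flag.is_set():
        iterations += 1  # stand-in for "send one round of demo alerts"
        if max_iterations is not None and iterations >= max_iterations:
            break
        # Event.wait() doubles as an interruptible sleep between rounds.
        shutdown_flag.wait(timeout=0.01)
    return iterations


flag = threading.Event()
worker = threading.Thread(target=simulate_alerts, args=(flag,))
worker.start()
flag.set()  # request shutdown
worker.join(timeout=5.0)
assert not worker.is_alive()
```

Using `shutdown_flag.wait(timeout=...)` instead of `time.sleep(...)` means a stop request interrupts the pause immediately rather than after the full interval.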

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
keep/api/core/demo_mode_runner.py (2)

25-55: Consider adding type hints and documentation for correlation rules.

The correlation rules configuration would benefit from:

  1. Type hints using a Pydantic model or dataclass
  2. Documentation explaining the rule format and parameters
+from typing import List, Dict, Union
+from dataclasses import dataclass
+
+@dataclass
+class CorrelationRule:
+    """Defines a rule for correlating alerts based on SQL and CEL queries.
+    
+    Attributes:
+        sqlQuery: SQL query configuration for matching alerts
+        groupDescription: Description of the alert group
+        ruleName: Unique name of the rule
+        celQuery: CEL query for matching alerts
+        timeframeInSeconds: Time window for grouping alerts
+        timeUnit: Unit of time for the timeframe
+        groupingCriteria: Criteria for grouping alerts
+        requireApprove: Whether approval is required
+        resolveOn: When to resolve the alert group
+    """
+    sqlQuery: Dict[str, Union[str, Dict[str, str]]]
+    groupDescription: str
+    ruleName: str
+    celQuery: str
+    timeframeInSeconds: int
+    timeUnit: str
+    groupingCriteria: List[str]
+    requireApprove: bool
+    resolveOn: str
+
+correlation_rules_to_create: List[CorrelationRule] = [
     {
         "sqlQuery": {"sql": "((name like :name_1))", "params": {"name_1": "%mq%"}},
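Assuming a dataclass along the lines suggested above, the existing dict literals could be converted (and implicitly validated) at startup, since `**`-unpacking fails loudly on missing or unexpected keys. Every field value below other than `sqlQuery` is illustrative, not taken from the file:

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Union


@dataclass
class CorrelationRule:
    sqlQuery: Dict[str, Union[str, Dict[str, str]]]
    groupDescription: str
    ruleName: str
    celQuery: str
    timeframeInSeconds: int
    timeUnit: str
    groupingCriteria: List[str]
    requireApprove: bool
    resolveOn: str


raw_rule = {
    "sqlQuery": {"sql": "((name like :name_1))", "params": {"name_1": "%mq%"}},
    "groupDescription": "MQ-related alerts",   # assumed value
    "ruleName": "mq",                          # assumed value
    "celQuery": 'name.contains("mq")',         # assumed value
    "timeframeInSeconds": 60,
    "timeUnit": "seconds",
    "groupingCriteria": [],
    "requireApprove": False,
    "resolveOn": "never",
}

# Unpacking doubles as a shape check for each configured rule.
rule = CorrelationRule(**raw_rule)
assert {f.name for f in fields(rule)} == raw_rule.keys()
```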

1-450: Consider architectural improvements for better maintainability.

The current implementation could benefit from:

  1. Separation of concerns: Move configuration to separate files
  2. Dependency injection: Extract HTTP client and make it injectable
  3. Better testability: Extract core logic from I/O operations
  4. Monitoring: Add metrics for alert simulation health

Consider:

  1. Creating separate configuration modules for rules, topology, and applications
  2. Implementing an abstract HTTP client interface
  3. Adding health checks and metrics endpoints
  4. Implementing proper logging and monitoring

Would you like me to provide a detailed architectural proposal?
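For the "abstract HTTP client interface" item, one possible shape is a `typing.Protocol` that both `requests` and a test double can satisfy. The endpoint path and function names here are assumptions, not Keep's actual API:

```python
from typing import Any, Optional, Protocol


class HTTPClient(Protocol):
    """Minimal surface the demo runner needs; requests or a fake can satisfy it."""
    def post(self, url: str, json: Optional[dict] = None, **kwargs: Any) -> Any: ...


class FakeClient:
    """Test double that records calls instead of hitting the network."""
    def __init__(self) -> None:
        self.calls: list = []

    def post(self, url: str, json: Optional[dict] = None, **kwargs: Any) -> dict:
        self.calls.append((url, json))
        return {"status": "ok"}


def send_demo_alert(client: HTTPClient, api_url: str, alert: dict) -> Any:
    # The runner depends only on the protocol, so tests never need a live server.
    return client.post(f"{api_url}/alerts/event", json=alert)


fake = FakeClient()
send_demo_alert(fake, "http://localhost:8080", {"name": "demo"})
assert fake.calls == [("http://localhost:8080/alerts/event", {"name": "demo"})]
```

In production code the same `send_demo_alert` would simply receive the real `requests` module (or a `requests.Session`), since both expose a compatible `post`.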

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between c1dd51a and 1e83a5c.

📒 Files selected for processing (1)
  • keep/api/core/demo_mode_runner.py (1 hunks)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (1)
keep/api/core/demo_mode_runner.py (1)

1-23: Consider organizing imports and adding type hints.

The imports could be better organized by grouping them into:

  1. Standard library imports
  2. Third-party imports
  3. Local imports

Also, consider adding type hints to improve code maintainability.

 import datetime
 import logging
 import os
 import random
 import threading
 import time
-from datetime import timezone
+from datetime import timezone
+from typing import Dict, List, Optional
 
 import requests
 from dateutil import parser
 from requests.models import PreparedRequest
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1e83a5c and d3ece2a.

📒 Files selected for processing (3)
  • keep/api/config.py (2 hunks)
  • keep/api/core/demo_mode_runner.py (1 hunks)
  • keep/providers/prometheus_provider/alerts_mock.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • keep/providers/prometheus_provider/alerts_mock.py
🔇 Additional comments (5)
keep/api/config.py (3)

16-16: LGTM! Environment variable declaration follows best practices.

The implementation correctly follows the project's naming conventions and safely handles boolean conversion.
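The actual conversion helper in `keep/api/config.py` is not reproduced in this review; a common pattern for the boolean handling being praised here might look like the following (the variable name is the one this PR introduces):

```python
import os


def env_bool(name: str, default: bool = False) -> bool:
    """Unset -> default; otherwise only common truthy spellings count as True."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("true", "1", "yes", "on")


os.environ["KEEP_LIVE_DEMO_MODE"] = "True"
assert env_bool("KEEP_LIVE_DEMO_MODE") is True
os.environ["KEEP_LIVE_DEMO_MODE"] = "false"
assert env_bool("KEEP_LIVE_DEMO_MODE") is False
```

The explicit whitelist avoids the classic pitfall of `bool(os.environ.get(...))`, where the string `"false"` is truthy.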


61-68: Verify the startup sequence order.

The current implementation launches uptime reporting before initializing demo mode. Let's verify whether this order is intentional and whether there are any dependencies between these operations.

✅ Verification successful

Let me gather more information about the implementation details of both functions to better understand their independence and potential interactions.


The startup sequence order is correct and safe.

The code analysis reveals that both functions are independent and can safely run in any order because:

  1. launch_uptime_reporting() runs in its own thread and only handles PostHog telemetry reporting
  2. launch_demo_mode() operates independently in a separate thread, managing demo-specific features like alert simulation and topology setup
  3. Neither function shares critical resources or has dependencies on the other's initialization

The current sequence is logical as it ensures basic telemetry is set up before any demo features are initialized.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any dependencies between uptime reporting and demo mode

# Check if demo mode affects or depends on uptime reporting
rg -A 5 "launch_uptime_reporting|launch_demo_mode" --type py

# Check for any shared resources or configurations
ast-grep --pattern 'def launch_$_() {
  $$$
}'

Length of output: 1668


Script:

#!/bin/bash
# Check the implementation details of both launch functions
cat keep/api/core/report_uptime.py
cat keep/api/core/demo_mode_runner.py

# Check if there are any shared dependencies or configurations
rg "POSTHOG_DISABLED|is_posthog_reachable" --type py

Length of output: 17714


63-66: ⚠️ Potential issue

Add error handling for demo mode initialization.

The demo mode initialization should gracefully handle potential failures to prevent server startup issues.

Apply this diff to add error handling:

     if LIVE_DEMO_MODE:
         logger.info("Launching Keep in demo mode.")
-        from keep.api.core.demo_mode_runner import launch_demo_mode
-        launch_demo_mode()
+        try:
+            from keep.api.core.demo_mode_runner import launch_demo_mode
+            launch_demo_mode()
+        except ImportError as e:
+            logger.error(f"Failed to import demo mode module: {e}")
+            raise
+        except Exception as e:
+            logger.error(f"Failed to launch demo mode: {e}")
+            # Consider whether to continue or exit based on your requirements
+            raise

Likely invalid or redundant comment.

keep/api/core/demo_mode_runner.py (2)

468-469: LGTM!

The main entry point is simple and follows best practices.


25-54: Review SQL query construction for potential injection vulnerabilities.

The correlation rules use parameterized queries, which is good. However, consider:

  1. Adding input validation for the parameters
  2. Using an SQL query builder or ORM to construct queries safely
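To illustrate why the parameterized `:name_1` placeholder already neutralizes injection, here is a stdlib `sqlite3` sketch of named-parameter binding (the table and rows are made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alerts (name TEXT)")
conn.executemany("INSERT INTO alerts VALUES (?)", [("mq-lag",), ("db-down",)])

# Named-parameter binding, mirroring the ((name like :name_1)) rule above.
rows = conn.execute(
    "SELECT name FROM alerts WHERE name LIKE :name_1", {"name_1": "%mq%"}
).fetchall()
assert rows == [("mq-lag",)]

# A hostile "pattern" stays data, never SQL: it simply matches nothing.
hostile = "%' OR '1'='1"
rows = conn.execute(
    "SELECT name FROM alerts WHERE name LIKE :name_1", {"name_1": hostile}
).fetchall()
assert rows == []
```

Input validation on the parameter values is still worthwhile (e.g. bounding pattern length), but it guards against abuse of LIKE wildcards, not injection.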

@Matvey-Kuk Matvey-Kuk marked this pull request as draft November 20, 2024 08:34
@Matvey-Kuk
Contributor Author

@coderabbitai resolve

coderabbitai bot commented Nov 20, 2024

✅ Actions performed

Comments resolved.

Labels
Documentation Improvements or additions to documentation Feature A new feature size:XL This PR changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[👨🏻‍💻 Internal]: Demo mode
2 participants